Notes From Dataset Source

(Data was imported from https://opendatakingston.cityofkingston.ca/explore/dataset/neighbourhood-census-profiles-income-occupation-education.)

“The Community Census Profiles are based on custom tabulations generated by Statistics Canada from the 2016 Population of Census for the CCSD Community Data Program.

The community profiles contain data from 2016 Census and long form program. The 2016 census data is considered to be of good quality and general comparisons can be made with similar data from previous years. Direct comparisons cannot be made between Statistics Canada’s 2016 Long Form data and the 2011 National Household Survey (NHS).

The figures shown in the tables and charts have been subjected to a confidentiality procedure known as random rounding to prevent the possibility of associating statistical data with any identifiable individual. Under this method, all figures, including totals and margins, are randomly rounded either up or down to a multiple of “5”, and in some cases “10”. While providing strong protection against disclosure, this technique does not add significant error to the data. The user should be aware that totals and margins are rounded independently of the cell data so that some differences between these and the sum of rounded cell data may exist. Also, minor differences can be expected in corresponding totals and cell values among various census tabulations.

Statistics Canada is committed to protect the privacy of all Canadians and the confidentiality of the data they provide. As part of this commitment, some population counts of geographic areas are adjusted in order to ensure confidentiality.”

Implications: The data is reliable but dated (2016 census), so it can support a preliminary analysis, but any findings remain inconclusive until cross-validated against more recent data.
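To make the random rounding described in the source notes more concrete, here is a minimal Python sketch (the report itself used Excel and R). The function name `random_round`, the cell counts, and the rule of rounding up with probability proportional to the remainder are all assumptions for illustration; the source does not state the exact probabilities used. The sketch shows why an independently rounded total can differ from the sum of the rounded cells.

```python
import random

def random_round(value, base=5, rng=random):
    """Randomly round a non-negative count up or down to a multiple of `base`."""
    lower = (value // base) * base
    remainder = value - lower
    if remainder == 0:
        return lower  # already a multiple of base, nothing to round
    # Assumption: round up with probability remainder/base, which keeps the
    # rounded value unbiased on average. The source does not specify this.
    return lower + base if rng.random() < remainder / base else lower

rng = random.Random(2016)   # fixed seed so the sketch is repeatable
cells = [12, 23, 31, 8]     # hypothetical true cell counts (sum is 74)
rounded_cells = [random_round(c, rng=rng) for c in cells]
rounded_total = random_round(sum(cells), rng=rng)  # total rounded on its own

# The two totals below may or may not agree, which is the caveat in the notes.
print(rounded_cells, sum(rounded_cells), rounded_total)
```

Each rounded cell stays within 5 of its true value, so the extra error is small relative to typical neighbourhood counts, but percentages built from rounded cells and rounded totals may not add up to exactly 100%.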

Data Exploration From Microsoft Excel

Prevalence of Low Income

According to Statistics Canada (https://www.ontario.ca/document/2016-census-highlights/fact-sheet-7-income), the below image indicates that the prevalence of low income based on LIM-AT (Low Income Measure - After Tax) in the 2016 census was 14.0% in Ontario and 14.4% in the rest of Canada. These figures motivate a KPI of 14% as the maximum threshold for the prevalence of low income based on LIM-AT in Kingston, so that Kingston targets the level achieved by Ontario as a province and/or Canada as a country.

The below chart was produced using Microsoft Excel and shows that Kingston’s prevalence of low income based on LIM-AT in 2015 compares relatively poorly with this threshold.
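The comparison against the 14% KPI can also be expressed in a few lines of code. The following is a minimal Python sketch, not the Excel workflow used for the chart; the neighbourhood names and values are hypothetical placeholders, and `exceeds_kpi` is an illustrative helper name.

```python
LOW_INCOME_KPI = 14.0  # percent; maximum threshold taken from the text above

# Hypothetical neighbourhood values (percent), not taken from the dataset.
prevalence = {
    "Neighbourhood A": 9.5,
    "Neighbourhood B": 14.0,
    "Neighbourhood C": 27.3,
}

def exceeds_kpi(values, threshold):
    """Return the names whose value is strictly above the threshold, highest first."""
    over = {name: v for name, v in values.items() if v > threshold}
    return sorted(over, key=over.get, reverse=True)

flagged = exceeds_kpi(prevalence, LOW_INCOME_KPI)
print(flagged)
```

A value exactly at the threshold is treated as meeting the KPI here, since 14% is described as a maximum; that boundary choice is an assumption.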

Unemployment Rate

According to an external source (https://www.investmentmonitor.ai/fdi-drivers/why-low-unemployment-rates-are-a-bad-thing/), “although there is no exact target for unemployment, most economists argue a rate between 3% and 5% is acceptable.” This motivates a KPI of 5% as a reasonable maximum threshold for a healthy unemployment rate.

The below chart was produced using Microsoft Excel and shows that Kingston’s unemployment rate performs markedly worse than this threshold in 2015.

Commuting Duration

The below chart was produced using Microsoft Excel and shows that most of Kingston’s employed labour force had short commuting times in 2015.

Data Cleaning For R Analysis

  1. Printed the structure of the dataset to check for and correct any variable type errors. None were found.

  2. Checked the dataset for nulls to either remove or substitute with proxy data. None were found.

  3. Converted counts to percentages, because raw counts are not comparable across neighbourhoods of different sizes.

  4. Reduced the dataset to only the columns relevant to the analysis, to cut computing time and dimensionality.

  5. Encoded column names (feature names) as short codes to produce compact, cleaner visuals.
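Steps 3 and 5 above can be sketched as follows. This is a minimal Python illustration rather than the R code used in the analysis; the function names, the `"Total"` key, and the counts are hypothetical.

```python
def counts_to_percentages(row, total_key):
    """Convert each count in a row to a percentage of that row's total."""
    total = row[total_key]
    if total == 0:
        raise ValueError("total is zero; percentage is undefined")
    return {k: round(100 * v / total, 1) for k, v in row.items() if k != total_key}

def encode_columns(names):
    """Map long feature names to short numeric codes for use as plot labels."""
    return {name: str(i) for i, name in enumerate(names, start=1)}

# Hypothetical counts for two neighbourhoods of very different size.
small = {"Employed": 450, "Unemployed": 50, "Not in labour force": 500, "Total": 1000}
large = {"Employed": 4500, "Unemployed": 500, "Not in labour force": 5000, "Total": 10000}

small_pct = counts_to_percentages(small, "Total")
large_pct = counts_to_percentages(large, "Total")
codes = encode_columns(list(small_pct))

print(small_pct)
print(codes)
```

The two neighbourhoods differ tenfold in raw counts but have identical percentage profiles, which is why percentages are the fairer basis for comparison.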

Data Analysis in R

Correlation Matrix Heat Map

A correlation matrix is a useful tool to help spot interesting relationships between variables in a fast and intuitive way. However, it must be noted that correlation does not imply causation. To identify true causal relationships, one must conduct experiments.
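As a rough illustration of what a correlation matrix contains, the Python sketch below computes pairwise Pearson coefficients using only the standard library. It is not the R code behind the report's heat map, and the column names and values are hypothetical. It also assumes the Pearson coefficient; the report does not say which correlation method was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for a constant column

def correlation_matrix(columns):
    """Nested dict holding the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values for five neighbourhoods, not taken from the dataset.
data = {
    "median_income": [40, 55, 70, 85, 100],
    "low_income_pct": [30, 22, 15, 10, 6],
    "unemployment_rate": [12, 9, 8, 6, 5],
}
matrix = correlation_matrix(data)

for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

The matrix is symmetric with ones on the diagonal, so a heat map only needs one triangle to convey every pairwise relationship.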

The figure below shows the correlation matrix viewed as a heat map to easily spot any strong correlations within all possible variable pairs in a single visualization.

## LEGEND:
## 1: Median after-tax income of households in 2015 (dollars) 
## 2: Prevalence of low income in 2015 based on after-tax low-income measure (percent) 
## 3: 0-9,999 - Household after-tax income groups in 2015 for private households (dollars) 
## 4: 10,000-19,999 - Household after-tax income groups in 2015 for private households (dollars) 
## 5: 20,000-29,999 - Household after-tax income groups in 2015 for private households (dollars)
## 6: 30,000-39,999 - Household after-tax income groups in 2015 for private households (dollars)
## 7: 40,000-49,999 - Household after-tax income groups in 2015 for private households (dollars)
## 8: 50,000-59,999 - Household after-tax income groups in 2015 for private households (dollars)
## 9: 60,000-69,999 - Household after-tax income groups in 2015 for private households (dollars)
## 10: 70,000-79,999 - Household after-tax income groups in 2015 for private households (dollars)
## 11: 80,000-89,999 - Household after-tax income groups in 2015 for private households (dollars)
## 12: 90,000-99,999 - Household after-tax income groups in 2015 for private households (dollars)
## 13: 100,000 and over - Household after-tax income groups in 2015 for private households (dollars) 
## 14: Employed - Population aged 15 years and over by Labour force status 
## 15: Unemployed - Population aged 15 years and over by Labour force status 
## 16: Not in labour force - Population aged 15 years and over by Labour force status 
## 17: Unemployment rate 
## 18: Management occupations - Labour force population by occupation (NOC) 
## 19: Business, finance and administration occupations - Labour force population by occupation (NOC) 
## 20: Natural and applied sciences and related occupations - Labour force population by occupation (NOC) 
## 21: Health occupations - Labour force population by occupation (NOC) 
## 22: Occupations in education, law and social, community and government services - Labour force population by occupation (NOC) 
## 23: Occupations in art, culture, recreation and sport - Labour force population by occupation (NOC) 
## 24: Sales and service occupations - Labour force population by occupation (NOC) 
## 25: Trades, transport and equipment operators and related occupations - Labour force population by occupation (NOC) 
## 26: Natural resources, agriculture and related production occupations - Labour force population by occupation (NOC) 
## 27: Occupations in manufacturing and utilities - Labour force population by occupation (NOC) 
## 28: Car, truck or van as a driver - Main mode of commuting for employed labour force 
## 29: Car, truck or van as a passenger - Main mode of commuting for employed labour force 
## 30: Public transit - Main mode of commuting for employed labour force 
## 31: Walked - Main mode of commuting for employed labour force 
## 32: Bicycle - Main mode of commuting for employed labour force 
## 33: Less than 15 minutes - Commuting duration for the employed labour force 
## 34: 15 to 29 minutes - Commuting duration for the employed labour force 
## 35: 30 to 44 minutes - Commuting duration for the employed labour force 
## 36: 45 to 59 minutes - Commuting duration for the employed labour force 
## 37: 60 minutes and over - Commuting duration for the employed labour force 
## 38: No certificate or degree - Highest certificate, diploma or degree 
## 39: High school - Highest certificate, diploma or degree 
## 40: Apprentice - Highest certificate, diploma or degree 
## 41: College - Highest certificate, diploma or degree 
## 42: University - Highest certificate, diploma or degree

Section 1: Why Some Neighbourhoods Are The Richest/Poorest

The figure below shows that poorer neighbourhoods correlate with higher unemployment rates.

The figure below shows that, amongst the employed labour force, a higher share of sales and service occupations may correlate with a higher percentage of highly underpaid households. Linear regression is used to quickly model the linear relationship between the variables of interest and to gauge the strength of that relationship using the R^2 statistic (the proportion of variability explained by the model).
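For readers who want to see how the fitted line and R^2 are obtained, here is a minimal ordinary least squares sketch in Python. It is not the R model from the analysis, and the variable names and data points are hypothetical.

```python
def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical percentages for five neighbourhoods, not taken from the dataset.
sales_service_pct = [15, 20, 25, 30, 35]
underpaid_household_pct = [8, 11, 13, 18, 20]

slope, intercept, r2 = fit_line(sales_service_pct, underpaid_household_pct)
print(round(slope, 2), round(intercept, 2), round(r2, 3))
```

An R^2 close to 1 means the straight line accounts for most of the variation in the response, while a value near 0 means the line explains little. As with correlation, a high R^2 does not establish causation.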

The figure below shows no meaningful correlation between household incomes and the percentage of the population whose highest qualification is high school (the apparent correlation seems to be largely driven by a single outlier). This may indicate that most high-school-only graduates are not in the labour force but are pursuing higher education; otherwise, a noticeable effect on the prevalence of low income would be expected, due either to underpaid jobs or to unemployment. If so, the ability to afford higher education may not be the primary determinant of household incomes.
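One way to check whether a correlation is driven by a single outlier is to recompute it with each observation removed in turn. The Python sketch below illustrates that idea with hypothetical data; it is an assumed diagnostic, not necessarily the check the author performed.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def leave_one_out_correlations(x, y):
    """Pearson r recomputed with each observation removed in turn."""
    return [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]

# Hypothetical data: five clustered neighbourhoods plus one extreme one (last).
high_school_pct = [20, 21, 22, 21, 20, 40]
median_income = [60, 58, 61, 62, 59, 30]

full_r = pearson(high_school_pct, median_income)
loo = leave_one_out_correlations(high_school_pct, median_income)

# Index of the observation whose removal changes r the most.
most_influential = max(range(len(loo)), key=lambda i: abs(full_r - loo[i]))
print(round(full_r, 2), [round(r, 2) for r in loo], most_influential)
```

In this made-up example the full-sample correlation is strongly negative, but it collapses to a weak value once the extreme point is dropped, which is the pattern described above.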

Section 2: Is There Enough Diversity in Occupations For New Grads?

Given that the census data groups Kingston’s labour force into 10 occupation categories, a KPI of 10% can be set as the minimum threshold for the percentage of the labour force population per occupation. This assumes that the ideal is an even split: 100% divided equally among the 10 occupations yields 10% of the labour force population per occupation.

Using a 95% confidence interval and the minimum threshold of 10%, the ridgeline density plot in the figure below shows that the following four occupations offer fairly low opportunity:

  • Natural and applied sciences and related occupations

  • Occupations in art, culture, recreation and sport

  • Natural resources, agriculture and related production occupations

  • Occupations in manufacturing and utilities
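The report does not spell out how the 95% confidence interval was combined with the 10% threshold. One possible reading, sketched below in Python, is to flag an occupation when the upper bound of the interval for its mean share across neighbourhoods falls below the KPI. The data values, the helper `mean_ci`, and the use of a normal approximation (rather than a t-distribution, which would suit small samples better) are all assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

N_OCCUPATIONS = 10
KPI_MIN_SHARE = 100 / N_OCCUPATIONS  # even split: 10 percent per occupation

def mean_ci(values, confidence=0.95):
    """Normal-approximation confidence interval (low, high) for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m - half_width, m + half_width

# Hypothetical shares (percent of labour force) across five neighbourhoods.
shares = {
    "Sales and service": [24.0, 26.5, 22.0, 28.0, 25.5],
    "Manufacturing and utilities": [2.0, 3.5, 1.5, 2.5, 3.0],
}

# Flag an occupation only when even the upper bound is below the KPI.
below_kpi = [name for name, v in shares.items() if mean_ci(v)[1] < KPI_MIN_SHARE]
print(KPI_MIN_SHARE, below_kpi)
```

Using the upper bound is the conservative choice: an occupation is flagged only when the data make it unlikely that its mean share reaches 10%.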

It’s also worth noting that sales and service occupations have the highest mean share of the labour force population. Together with their apparent association with the percentage of highly underpaid households, shown earlier in Section 1, this may imply that higher minimum pay within sales and service occupations could reduce the prevalence of low income.

Key Takeaway

Kingston’s primary opportunities for development may lie less in making education affordable and more in providing (1) more employment opportunities, especially within the less common occupations of choice, and (2) higher minimum incomes within the more commonly available occupations, especially sales and service, because it is also the most prominent occupation group in the labour force.